# **IJESRT**

### INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

### Design of FIR Filter Using Area and Power Efficient Truncated Multiplier

R. Ambika<sup>\*1</sup>, S. Siva Ranjani<sup>2</sup>

<sup>\*1</sup>Assistant Professor, <sup>2</sup>PG Student, Department of ECE PSNA College of Engineering and Technology.

Tamilnadu, India

ambika.theni@gmail.com

#### Abstract

This paper describes the design of a Finite Impulse Response (FIR) filter using a rounded truncated multiplier, which reduces area, delay, and power. The proposed method reduces the number of full adders and half adders used during tree reduction in the multiplier block. The multiplier output is split into an MSB part and an LSB part, and the LSB part is compressed by deletion, reduction, truncation, rounding, and final addition. With this scheme the truncation error does not exceed 1 ulp (unit in the last place), so no error compensation circuit is needed and the final output remains precise. The proposed filter is described in VHDL and simulated using the ISE Simulator (ISim). It achieves better area and power results than previous FIR design approaches.

General Terms: Digital signal processing, bit width optimization, VLSI design.

Keywords: Finite impulse response (FIR) filter, truncated multiplier, partial product (PP), carry propagate adder (CPA), flip-flop, ulp

#### Introduction

The finite impulse response (FIR) digital filter is a major component in digital signal processing (DSP) and communication systems. Because it can be implemented with limited area and power, it is widely used in portable applications. The two basic structures for a linear-phase FIR filter are the direct form and the transposed form, as shown in Fig 1. In the direct form of Fig 1(a), the multiple constant multiplication/accumulation (MCMA) module performs concurrent multiplications [1] of the individual delayed signals with the corresponding filter coefficients, followed by accumulation of all the products. The operands of the multipliers in MCMA are therefore the delayed input signals $x[n-i]$ and the coefficients $a_i$. In the transposed form of Fig 1(b), the operands of the multipliers in the multiple constant multiplication (MCM) module are the current input signal $x[n]$ and the coefficients, and the results of the individual constant multiplications pass through structure adders (SAs) and delay elements. Hardware implementations of digital FIR filters can be classified into two categories: multiplierless based and memory based.





Fig 1 : Structures of linear-phase FIR filters: (a) direct form and (b) transposed form.
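To make the difference between the two structures concrete, the following Python sketch models both forms in software. It is an illustration only, not the authors' VHDL design: the function names `fir_direct` and `fir_transposed` and the integer test data are assumptions introduced here. Both functions should produce the same output sequence, since the two forms differ only in where the delays sit.

```python
from typing import List


def fir_direct(coeffs: List[int], samples: List[int]) -> List[int]:
    """Direct form: a delay line holds past inputs; all products are summed at once."""
    delay_line = [0] * len(coeffs)              # x[n], x[n-1], ..., x[n-M+1]
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]      # shift the new sample in
        out.append(sum(a * d for a, d in zip(coeffs, delay_line)))
    return out


def fir_transposed(coeffs: List[int], samples: List[int]) -> List[int]:
    """Transposed form: x[n] feeds every multiplier; partial sums are delayed."""
    regs = [0] * (len(coeffs) - 1)              # registers between structure adders
    out = []
    for x in samples:
        products = [a * x for a in coeffs]      # the MCM block
        out.append(products[0] + (regs[0] if regs else 0))
        # Each register takes its product plus the old value of the next register.
        new_regs = []
        for i in range(len(regs)):
            upstream = regs[i + 1] if i + 1 < len(regs) else 0
            new_regs.append(products[i + 1] + upstream)
        regs = new_regs
    return out


if __name__ == "__main__":
    taps = [1, 2, 3, 4]                         # hypothetical coefficients
    signal = [1, 0, 0, 0, 5]
    print(fir_direct(taps, signal))
    print(fir_transposed(taps, signal))
```

The direct form stores input samples, so its register width follows the signal width, while the transposed form stores partial sums, whose width grows with the coefficient width.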

Multiplierless-based designs realize MCM with shift-and-add operations. Common sub-operations are shared using canonical signed digit (CSD) recoding and common subexpression elimination (CSE) to minimize the adder cost of MCM. Memory-based FIR designs follow two types of approaches [1]: lookup table (LUT) methods and distributed arithmetic (DA) methods. The LUT-based design [3] stores odd multiples of the input signal in ROMs to perform the constant multiplications in MCM. The DA-based design [4] accumulates the bit-level partial results of the inner product computation in the FIR filter. In this brief, an FIR filter with a new truncated multiplier design is presented, which produces precise, rounded results. Because the method jointly considers PP reduction, truncation, and rounding, the final truncated product satisfies the precision requirement of the FIR filter design with parallel truncated multipliers.
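As an illustration of shift-and-add constant multiplication, the sketch below recodes a coefficient into CSD digits and multiplies using only shifts, additions, and subtractions. The helper names `to_csd` and `csd_multiply` are hypothetical, the coefficient is assumed to be a non-negative integer, and common subexpression elimination is not modelled.

```python
from typing import List


def to_csd(value: int) -> List[int]:
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)     # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, coeff: int) -> int:
    """Multiply x by a constant using only shifts, adds and subtracts."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    print(to_csd(7))              # digits of 8 - 1, LSB first
    print(csd_multiply(13, 7))
```

For example, the coefficient 7 is recoded as 8 - 1, so one subtraction replaces the two additions needed by the binary form 4 + 2 + 1.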

#### **Tree Reduction of Parallel Multipliers**

PP generation, PP reduction, and final carry-propagate addition are the three main steps in the design of a parallel tree multiplier. PP generation produces the PP bits from the multiplicand and the multiplier. PP reduction compresses the PPs to two rows, which are summed by the final addition. Wallace tree and Dadda tree reductions are the two widely used reduction methods. Wallace tree reduction compresses the PPs [5] as early as possible, whereas Dadda reduction performs compression only when necessary, without increasing the number of carry-save addition (CSA) levels. A new reduced-area (RA) reduction method [5] has been proposed so that the bit width of the final carry-propagate adder (CPA) is minimized.
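The three steps can be sketched in Python as follows. This is a simplified model under stated assumptions: operands are unsigned, only full adders (3:2 counters) are used, and the compression schedule is a generic column-wise one, so it is neither the Wallace nor the Dadda schedule and its adder counts are not comparable with those of the paper. All function names are hypothetical.

```python
from typing import List


def generate_pps(a: int, b: int, n: int = 8) -> List[List[int]]:
    """PP generation: columns[k] holds every bit a_i AND b_j with i + j == k."""
    columns: List[List[int]] = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append((a >> i) & (b >> j) & 1)
    return columns


def reduce_to_two_rows(columns: List[List[int]]) -> List[List[int]]:
    """PP reduction: apply full adders per column until no column exceeds two bits."""
    columns = [list(col) for col in columns]
    while any(len(col) > 2 for col in columns):
        nxt: List[List[int]] = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            while len(col) >= 3:
                x, y, z = col.pop(), col.pop(), col.pop()
                nxt[k].append(x ^ y ^ z)                          # sum stays in column k
                nxt[k + 1].append((x & y) | (y & z) | (x & z))    # carry moves to k + 1
            nxt[k].extend(col)                                    # leftover bits pass through
        columns = nxt
    return columns


def final_addition(columns: List[List[int]]) -> int:
    """Final carry-propagate addition of the two remaining rows."""
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) >= 1)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) == 2)
    return row0 + row1


def tree_multiply(a: int, b: int, n: int = 8) -> int:
    return final_addition(reduce_to_two_rows(generate_pps(a, b, n)))


if __name__ == "__main__":
    print(tree_multiply(200, 123), 200 * 123)
```

Each full adder replaces three bits of one column by a sum bit in the same column and a carry bit in the next, so the value of the matrix is preserved while its height shrinks.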

#### Table 1 Specifications Of The Two Multiplier Schemes

| Scheme       | Full adders (FAs) | Half adders (HAs) |
|--------------|-------------------|-------------------|
| Dadda tree   | 38                | 8                 |
| Wallace tree | 35                | 7                 |

From the above table it is evident that the Wallace multiplier contains fewer FAs and HAs. Hence the truncation operation is applied to the Wallace multiplier, and the FIR filter is designed using the truncated Wallace multiplier.

# Proposed Filter Design with Truncated Multiplier

The FIR filter is designed in the direct form of Fig 1(a), where the MCMA module sums up all the products $a_i \times x[n-i]$. All the PPs are efficiently collected into a single PP bit (PPB) matrix, and carry-save addition reduces the height of the matrix to two. Finally, the addition is performed by a carry-propagate adder in this multiplier.
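A minimal sketch of the merged matrix, assuming unsigned operands and a hypothetical 4-tap filter, is given below. It only shows that the PP bits of all taps can share one column-indexed matrix whose column sums equal the sum of products; the carry-save reduction itself is not modelled, and the names `mcma_columns` and `column_sum` are introduced here for illustration.

```python
from typing import List


def mcma_columns(coeffs: List[int], window: List[int], n: int = 8) -> List[List[int]]:
    """Collect the PP bits of every product a_i * x[n-i] into one PPB matrix."""
    columns: List[List[int]] = [[] for _ in range(2 * n)]
    for a, x in zip(coeffs, window):
        for i in range(n):
            for j in range(n):
                columns[i + j].append((a >> i) & (x >> j) & 1)
    return columns


def column_sum(columns: List[List[int]]) -> int:
    """Value of the matrix: every bit weighted by its column position."""
    return sum(sum(col) << k for k, col in enumerate(columns))


if __name__ == "__main__":
    coeffs = [3, 5, 7, 9]            # hypothetical coefficients a_0..a_3
    window = [10, 20, 30, 40]        # x[n], x[n-1], x[n-2], x[n-3]
    matrix = mcma_columns(coeffs, window)
    print(column_sum(matrix), sum(a * x for a, x in zip(coeffs, window)))
    print(max(len(col) for col in matrix))   # tallest column of the merged matrix
```

Because the products are merged before reduction, only one final carry-propagate addition is needed for the whole sum instead of one per tap.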

## Truncated Multiplier With PP Truncation And Compression

The proposed truncated multiplier consists of several operations: deletion, reduction, truncation, rounding, and final addition [1]. As an example, consider an $8 \times 8$ unsigned fractional multiplication in which eight product bits are truncated. First, all unnecessary PP bits, that is, those that need not be generated, are removed, as shown by the blue dots in Fig 2. Next comes deletion, where as many PP bits as possible are deleted while the deletion error $E_D'$ remains bounded by $-1/2$ ulp $\leq E_D' \leq 0$. Injecting a correction bias constant [8] of 1/4 ulp shifts the deletion error to $-1/4$ ulp $\leq E_D \leq 1/4$ ulp. After the deletion of PP bits, per-column reduction is performed, which generates two rows of PP bits.
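The deletion bound can be checked by hand for the $8 \times 8$ example. The arithmetic below assumes that column $j$, counted from the LSB as column 1, holds $j$ PP bits of weight $2^{j-1}$, and that 1 ulp equals $2^{8}$ in the same units because eight product bits are truncated; which bits are actually deleted in Fig 2 is not assumed. Deleting columns 1 to 4 in full stays within the bound, whereas also deleting all of column 5 would exceed it:

$$\sum_{j=1}^{4} j \cdot 2^{j-1} = 1 + 4 + 12 + 32 = 49 \le 128 = \tfrac{1}{2}\,\text{ulp}, \qquad 49 + 5 \cdot 2^{4} = 129 > 128 .$$

Adding the 1/4 ulp bias then shifts the interval without changing its width:

$$-\tfrac{1}{2}\,\text{ulp} \le E_D' \le 0 \;\Longrightarrow\; -\tfrac{1}{4}\,\text{ulp} \le E_D = E_D' + \tfrac{1}{4}\,\text{ulp} \le \tfrac{1}{4}\,\text{ulp}.$$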







#### Fig 2 : 8 × 8 truncated multiplication with eight product bits truncated. (a) Deletion, reduction, and truncation of PP bits. (b) Deletion, reduction, truncation, and rounding plus final addition.

Next, truncation removes the first row of $n-1$ bits from column 1 to column $n-1$, as shown by the crossed red dots in Fig 2(a). Truncation introduces an error of $-1/2$ ulp $< E_T' \leq 0$. This error is adjusted by injecting another bias constant [6] of 1/4 ulp, so that it is bounded by $-1/4$ ulp $< E_T \leq 1/4$ ulp. After deletion, reduction, and truncation are complete, the PP bits are added using a CPA, which generates the final product of $P$ bits, as shown in Fig 2(b). The bits left after deletion and truncation can be safely removed before the final addition, as they do not affect the carry bit to column $n+1$ during the rounding process. A final bias constant [8] of 1/2 ulp is added before the CPA addition, giving a rounding error of $-1/2$ ulp $< E_R \leq 1/2$ ulp. The rounding process removes the bit at column $n$ after the final CPA. Thus the faithfully rounded truncated multiplier has a total error of $-\text{ulp} < E = (E_D + E_T + E_R) \leq \text{ulp}$. As the total error is no more than 1 ulp [6], the proposed truncated multiplier design achieves faithful rounding. Furthermore, the three bias constants $1_D$, $1_T$, and $1_R$ can be collected and added as a single constant bit at column $n+1$, so that the overall height of the PP matrix is not increased.
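The error bound can be explored numerically with a deliberately simplified model. The sketch below is not the multiplier of Fig 2: it assumes an $8 \times 8$ unsigned multiplication, deletes PP columns 1 to 4 in full (an assumption that keeps the deletion error above $-1/2$ ulp), omits the truncation step together with its 1/4 ulp bias, and sums the kept bits exactly instead of through a reduction tree. All names and constants are hypothetical.

```python
N = 8                    # operand width (assumed, matching the 8 x 8 example)
ULP = 1 << N             # weight of the least significant kept product bit
DELETED_COLUMNS = 4      # assumption: columns 1..4 are deleted in full


def deleted_value(a: int, b: int) -> int:
    """Sum of the PP bits a_i AND b_j that lie in the deleted columns."""
    total = 0
    for i in range(DELETED_COLUMNS):
        for j in range(DELETED_COLUMNS - i):
            total += (((a >> i) & (b >> j)) & 1) << (i + j)
    return total


def truncated_multiply(a: int, b: int) -> int:
    """Upper N product bits after deletion, bias injection and rounding."""
    kept = a * b - deleted_value(a, b)        # PP bits that are still summed
    biased = kept + ULP // 4 + ULP // 2       # deletion bias + rounding bias
    return biased >> N                        # drop the N low-order columns


def worst_error_in_ulp() -> float:
    """Exhaustively search all operand pairs for the largest absolute error."""
    worst = 0
    for a in range(1 << N):
        for b in range(1 << N):
            err = truncated_multiply(a, b) * ULP - a * b
            worst = max(worst, abs(err))
    return worst / ULP


if __name__ == "__main__":
    print(truncated_multiply(255, 255))
    print(worst_error_in_ulp())   # stays below 1 ulp in this simplified model
```

In this model the two injected constants play the roles of $1_D$ and $1_R$; adding the truncation step would bring in the third constant $1_T$ and a further error term bounded as described above.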

#### Delay Element

The direct-form FIR filter of Fig 1(a) requires the multiplied output to be delayed using a D flip-flop, which is widely used in many applications. The D flip-flop captures the multiplied output from the truncated multiplier at a particular point of the clock cycle. At the rising edge of the clock, the captured value is passed to the final CPA. At any other time, including the falling edge, the flip-flop output does not change. The truth table is as follows.

#### Table 2 Truth Table Of D Flip Flop

| Clock           | D | Qnext |
|-----------------|---|-------|
| Rising edge     | 0 | 0     |
| Rising edge     | 1 | 1     |
| Non-Rising edge | X | Q     |
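The truth table can be expressed as a small behavioural model. The Python class below is an illustrative sketch, not the VHDL used in the paper; the class name `DFlipFlop` and the method `tick` are assumptions introduced here.

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop: Q changes only on a 0 -> 1 clock edge."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk: int, d: int) -> int:
        """Apply the current clock level and data input, return Q."""
        if self._prev_clk == 0 and clk == 1:    # rising edge: capture D
            self.q = d
        self._prev_clk = clk                    # otherwise Q keeps its value
        return self.q


if __name__ == "__main__":
    ff = DFlipFlop()
    for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
        print(clk, d, ff.tick(clk, d))
```

A multi-bit delay element is then one such flip-flop per signal bit, all sharing the same clock.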

FIR filter designs based on the transposed structure need fewer adders, which leads to a 20% reduction of the total area. However, the area of the D flip-flops (DFFs) in the transposed form is larger. As an example [1], consider a 121-tap filter with 19-bit coefficients and 12-bit signals. The direct form requires only $12 \times 120 = 1440$ flip-flops, while the transposed form requires 3701 flip-flops. Thus the direct form of the FIR filter, which requires fewer flip-flops, is designed here.

#### Carry Look Ahead Adder

Completing the FIR filter design requires the addition of two consecutive outputs, direct and delayed, using a carry look-ahead adder. The carry look-ahead adder improves the speed of the FIR filter by reducing the time needed to determine the carry bits. It is built from generate and propagate terms, defined as follows for inputs A and B:

$$P_i = A_i \oplus B_i$$

$$G_i = A_i B_i$$

The output sum and carry can be defined as

$$S_i = P_i \oplus C_i$$

$$C_{i+1} = G_i + P_i C_i$$

$G_i$ is known as the carry generate signal: a carry $C_{i+1}$ is generated whenever $G_i$ is 1. $P_i$ is known as the carry propagate signal: whenever $P_i$ is 1, the input carry $C_i$ is propagated to the output carry, that is, $C_{i+1} = C_i$. The carries $C_i$ and $C_{i+1}$ are created after the carry generate and carry propagate signals have been generated.
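The generate and propagate terms can be exercised with a short software model. The Python sketch below is an assumption-laden illustration rather than the paper's VHDL: the function name `carry_lookahead_add`, the default 8-bit width, and the fully expanded carry expression are choices made here. Each carry is computed directly from the $G$ and $P$ signals and the input carry, using the standard expansion of $C_{i+1} = G_i + P_i C_i$.

```python
from typing import Tuple


def carry_lookahead_add(a: int, b: int, width: int = 8, carry_in: int = 0) -> Tuple[int, int]:
    """Add two unsigned integers using generate/propagate signals.

    Returns (sum modulo 2**width, carry out).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # P_i = A_i xor B_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]    # G_i = A_i and B_i

    carries = [carry_in]
    for i in range(width):
        # C_{i+1} = G_i + P_i G_{i-1} + ... + P_i P_{i-1} ... P_0 C_0
        c_next = g[i]
        prefix = p[i]
        for k in range(i - 1, -1, -1):
            c_next |= prefix & g[k]
            prefix &= p[k]
        c_next |= prefix & carry_in
        carries.append(c_next)

    total = sum((p[i] ^ carries[i]) << i for i in range(width))   # S_i = P_i xor C_i
    return total, carries[width]


if __name__ == "__main__":
    s, c_out = carry_lookahead_add(200, 100)
    print(s, c_out, (c_out << 8) + s)
```

Because no carry expression depends on another carry, all carries can be evaluated in parallel in hardware, which is the source of the speed advantage over a ripple-carry adder.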

#### **Implementation Results**

The FIR filter is described in VHDL and simulated using the ISim simulator, and it is also implemented on an FPGA using Xilinx ISE Design Suite 13.2. A screenshot of the simulation is given in Fig 3.



**Fig 3: Screen Shot of Simulated Filter Output**

The input signal x is multiplied by the constants a0, a1, a2, and a3; the products are delayed, and the final addition provides the required output y. After simulation with ISim, the design is synthesized.

ISE project status (module `tranform`, target device xc3s250e-5pq208, ISE 13.2, design goal Balanced): no errors, 46 warnings, all signals completely routed, all constraints met, final timing score 0.

| Logic Utilization                              | Used | Available | Utilization |
|------------------------------------------------|------|-----------|-------------|
| Number of Slice Flip Flops                     | 74   | 4,896     | 1%          |
| Number of 4 input LUTs                         | 185  | 4,896     | 3%          |
| Number of occupied Slices                      | 100  | 2,448     | 4%          |
| Number of Slices containing only related logic | 100  | 100       | 100%        |
| Number of Slices containing unrelated logic    | 0    | 100       | 0%          |
| Total Number of 4 input LUTs                   | 185  | 4,896     | 3%          |
| Number of bonded IOBs                          | 20   | 158       | 12%         |
| Number of BUFGMUXs                             | 1    | 24        | 4%          |
| Average Fanout of Non-Clock Nets               | 4.01 |           |             |

Fig 4 : Area Analysis Of FIR filter



#### Fig 5 : Power Requirement Of FIR filter

The area, power, and delay analyses are shown in Figs 4, 5, and 6, respectively.

Delay: 12.850 ns (Levels of Logic = 9). Source: `x<8>` (PAD). Destination: `y<11>` (PAD). Data Path: `x<8>` to `y<11>`.

| Cell:in->out | fanout | Gate Delay (ns) | Net Delay (ns) | Logical Name (Net Name)                    |
|--------------|--------|-----------------|----------------|--------------------------------------------|
| IBUF:I->O    | 43     | 1.106           | 1.145          | `x_8_IBUF (b1/f6/Mxor_s_Result_and0000)`   |
| LUT3:I1->O   | 2      | 0.612           | 0.532          | `b20/carry_out121 (b20/carry_out_bdd20)`   |
| LUT3:I0->O   | 2      | 0.612           | 0.383          | `b20/carry_out111 (b20/carry_out_bdd18)`   |
| LUT4:I3->O   | 4      | 0.612           | 0.502          | `b20/carry_out101 (b20/carry_out_bdd16)`   |
| LUT4:I3->O   | 3      | 0.612           | 0.481          | `b20/carry_out81 (b20/carry_out_bdd12)`    |
| LUT3:I2->O   | 3      | 0.612           | 0.481          | `b20/carry_out61 (b20/carry_out_bdd8)`     |
| LUT3:I2->O   | 2      | 0.612           | 0.410          | `b20/carry_out41 (b20/carry_out_bdd4)`     |
| LUT3:I2->O   | 1      | 0.612           | 0.357          | `b20/sum<11>1 (y_11_OBUF)`                 |
| OBUF:I->O    |        | 3.169           |                | `y_11_OBUF (y<11>)`                        |

Total: 12.850 ns (8.559 ns logic, 4.291 ns route; 66.6% logic, 33.4% route)

Total REAL time to Xst completion: 6.00 secs Total CPU time to Xst completion: 5.39 secs

#### Fig 6 Delay Analysis

The designed FIR filter uses 74 flip-flops, which is only 1% of the available flip-flops, with a power consumption of about 2.141 W and a delay of 12.850 ns.

#### **Conclusion and Future Work**

This paper describes the design and implementation of a low-cost FIR filter using a truncated multiplier. The following table compares filter structures using the Wallace and truncated Wallace multipliers. Because the number of full adders contributes directly to area, the FIR filter design using the truncated multiplier, which contains fewer full adders, offers the smallest area cost and power consumption. This filter design can be extended by using the Montgomery multiplier.

#### Table 3 Comparative Result

| FIR filter design method   | Number of FA | Number of HA | Structure |
|----------------------------|--------------|--------------|-----------|
| Using Wallace multiplier   | 69           | 6            | Direct    |
| Using truncated multiplier | 61           | 1            | Direct    |

The proposed design [8] involves significantly lower area-delay and power-delay complexities compared with the best of the existing designs.

**References** 

- [1] S.-F. Hsiao, J.-H. Zhang Jian, and M.-C. Chen, "Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 5, May 2013.
- [2] H.-J. Ko and S.-F. Hsiao, "Design and application of faithfully rounded and truncated multipliers with combined deletion, reduction, truncation, and rounding," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 5, pp. 304–308, May 2011.
- [3] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.
- [4] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463–466, May 2004.
- [5] R. S. Waters and E. E. Swartzlander, Jr., "A reduced complexity Wallace multiplier reduction," IEEE Trans. Comput., vol. 59, no. 8, Aug. 2010.
- [6] M. J. Schulte and E. E. Swartzlander, Jr., "Truncated multiplication with correction constant," in VLSI Signal Processing VI. Piscataway, NJ: IEEE Press, 1993, pp. 388–396.
- [7] C.-H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.
- [8] J. Xie, J. J. He, and P. K. Meher, "Low latency systolic Montgomery multiplier for finite field GF(2<sup>m</sup>) based on pentanomials," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 2, Feb. 2013.